TALON is a novel approach for on-the-fly category discovery (OCD) that utilizes a test-time adaptation framework to continuously learn from unlabeled data streams. This repository contains the official implementation of our CVPR 2026 paper.
TALON/
├── config.py                 <- Dataset root paths & DINO pretrain path
├── train.py                  <- Training entry point
├── test.py                   <- Evaluation entry point
├── pyproject.toml            <- Project metadata & dependencies (uv)
│
├── data/                     <- Dataset loading modules
│   ├── cifar.py                 CIFAR-10 / CIFAR-100
│   ├── cub.py                   CUB-200-2011
│   ├── food101.py               Food-101
│   ├── pets.py                  Oxford-IIIT Pet
│   ├── scars.py                 Stanford Cars
│   └── imagenet.py              ImageNet-100
│
├── methods/                  <- Model implementations
│   └── talon/
│       ├── model.py             TALONModel (backbone + learnable prototypes)
│       ├── trainer.py           Training loop, TTA, evaluation logic
│       └── utils.py             NCM prototypes, logits, metrics
│
├── tools/                    <- General utilities
│   ├── evaluate_utils.py        Clustering accuracy (Hungarian assignment)
│   ├── losses.py                Loss functions
│   └── train_utils.py           SmoothedValue, training helpers
│
└── checkpoints/              <- Pretrained model weights (download below)
    ├── clip/{cub,food,scars}/
    └── dino/{cub,food,scars}/
This project uses uv for dependency management to ensure a clean and reproducible environment.
Requirements:
- Python >= 3.12
- PyTorch (CUDA 12.4)
- OpenAI CLIP / timm (DINO ViT-B/16)
# 1. Clone the repository
git clone https://github.com/ynanwu/TALON
cd TALON
# 2. Install uv (if not already installed)
# See https://github.com/astral-sh/uv for details
# 3. Install all dependencies
uv sync
# That's it! Use `uv run` to execute any script; there is no need to manually activate the venv.

We provide pretrained checkpoints for CUB, Food-101, and Stanford Cars using both CLIP and DINO backbones.
📥 Download from Google Drive or Hugging Face and place them as follows:
checkpoints/
├── clip/
│   ├── cub/
│   │   └── best_model.pth
│   ├── food/
│   │   └── best_model.pth
│   └── scars/
│       └── best_model.pth
└── dino/
    ├── cub/
    │   └── best_model.pth
    ├── food/
    │   └── best_model.pth
    └── scars/
        └── best_model.pth
Supported datasets and their known / novel class splits:
| Dataset | Total Classes | Known Classes | Novel Classes |
|---|---|---|---|
| CIFAR-10 | 10 | 6 | 4 |
| CIFAR-100 | 100 | 80 | 20 |
| CUB-200-2011 | 200 | 100 | 100 |
| Oxford-IIIT Pet | 37 | 19 | 18 |
| Stanford Cars | 196 | 98 | 98 |
| Food-101 | 101 | 51 | 50 |
| ImageNet-100 | 100 | 80 | 20 |
Configure the dataset root paths in config.py before running:
# config.py
CUB_ROOT = "datasets/CUB"
FOOD_101_ROOT = "datasets/Food101"
OXFORD_PET_ROOT = "datasets/OxfordPets"
SCARS_ROOT = "datasets/stanford_cars/"
CIFAR_10_ROOT = "datasets/CIFAR/"
CIFAR_100_ROOT = "datasets/CIFAR/"
IMAGENET_ROOT = "datasets/imagenet/"
# DINO backbone pretrained weights
pretrain_path = "dino_vitbase16_pretrain.pth"

# CUB with CLIP backbone
uv run train.py --dataset_name cub --backbone clip --save_dir my_experiment --device cuda:0
# Food-101 with DINO backbone, custom tau and TTA
uv run train.py --dataset_name food --backbone dino --tau 0.8 --tta_state M+P --epochs 100 --device cuda:0
# Stanford Cars with CLIP, Model TTA only
uv run train.py --dataset_name scars --backbone clip --tta_state M --save_dir scars_exp --device cuda:0

📋 Full list of training arguments
| Argument | Type | Default | Description |
|---|---|---|---|
| `--seed` | int | 1028 | Random seed |
| `--dataset_name` | str | `cub` | pets / scars / cub / food / imagenet100 / cifar10 / cifar100 |
| `--backbone` | str | `clip` | clip or dino |
| `--tta_state` | str | `M+P` | M / P / M+P / none (see below) |
| `--tau` | float | 0.75 | Threshold for novel class detection (see below) |
| `--device` | str | auto | e.g. cuda:0 (auto-selects best GPU if empty) |
| `--save_dir` | str | `test` | Directory for logs and checkpoints |
| `--train_batch_size` | int | 128 | Training batch size |
| `--eval_batch_size` | int | 64 | Evaluation batch size |
| `--num_workers` | int | 8 | DataLoader workers |
| `--prop_train_labels` | float | 0.5 | Proportion of labeled training data |
| `--epochs` | int | 100 | Total training epochs |
| `--start_epoch` | int | 0 | Resume from epoch |
| `--clip_grad` | float | None | Gradient clipping max norm |
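For readers who want to script around these options, the table can be mirrored with a small `argparse` parser. This is a hypothetical sketch, not the repository's `train.py`: the argument names and defaults are copied from the table, while `build_parser` and the use of `choices` to validate values are assumptions made for illustration, and only a subset of the arguments is shown.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser; names and defaults follow the argument table."""
    parser = argparse.ArgumentParser(description="TALON-style training arguments (sketch)")
    parser.add_argument("--seed", type=int, default=1028, help="Random seed")
    parser.add_argument(
        "--dataset_name", type=str, default="cub",
        # Assumption: the real script may not restrict values this way.
        choices=["pets", "scars", "cub", "food", "imagenet100", "cifar10", "cifar100"],
    )
    parser.add_argument("--backbone", type=str, default="clip", choices=["clip", "dino"])
    parser.add_argument("--tta_state", type=str, default="M+P",
                        choices=["M", "P", "M+P", "none"])
    parser.add_argument("--tau", type=float, default=0.75,
                        help="Threshold for novel class detection")
    parser.add_argument("--epochs", type=int, default=100, help="Total training epochs")
    parser.add_argument("--clip_grad", type=float, default=None,
                        help="Gradient clipping max norm")
    return parser


# Parse an explicit list instead of sys.argv so the snippet runs anywhere.
args = build_parser().parse_args(["--dataset_name", "food", "--backbone", "dino", "--tau", "0.8"])
print(args.dataset_name, args.backbone, args.tau, args.tta_state)
```

Arguments that are not passed keep the defaults listed in the table, so `--tta_state` stays at `M+P` in the call above.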
`tau` is the confidence threshold for novel class detection. When a test sample's maximum cosine similarity to the known prototypes falls below `tau`, the sample is treated as belonging to a novel (unseen) class and a new prototype is created.
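The thresholding rule can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the code in `methods/talon/`: the function names are hypothetical, the new prototype is initialized from the sample's own feature, and previously discovered prototypes are compared alongside the known ones. None of these details are confirmed by the description above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_or_discover(feature, prototypes, tau=0.75):
    """Return (class_index, is_novel); appends a prototype when the sample is novel."""
    sims = [cosine(feature, p) for p in prototypes]
    best = max(range(len(sims)), key=sims.__getitem__)
    if sims[best] < tau:
        # Assumption: the new prototype starts as a copy of the sample feature.
        prototypes.append(list(feature))
        return len(prototypes) - 1, True
    return best, False


prototypes = [[1.0, 0.0], [0.0, 1.0]]  # two toy "known" class prototypes
print(assign_or_discover([0.9, 0.1], prototypes))  # close to the first prototype
print(assign_or_discover([1.0, 1.0], prototypes))  # similarity ~0.707 to both, below tau
```

Raising `tau` makes the rule create prototypes more readily, while lowering it pushes more samples into existing classes.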
| Mode | What Gets Updated | Description |
|---|---|---|
| `M` | Backbone norm layers | Model TTA: fine-tunes the affine parameters (weight & bias) of LayerNorm/BatchNorm in the last transformer block. Minimizes entropy + maximizes instance-to-prototype similarity + inter-class repulsion. |
| `P` | Class prototypes | Prototype TTA: updates class prototype vectors via EMA based on test features. Adapts the classifier without touching the backbone. |
| `M+P` | Both | Joint TTA: applies both Model TTA and Prototype TTA simultaneously. Typically yields the best performance. (Recommended) |
| `none` | Nothing | No adaptation. Uses fixed model and prototypes. Useful as a baseline. |
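To make the Prototype TTA row concrete, the following sketch shows one common form of an exponential moving average (EMA) prototype update. It is an illustration only: the function names, the `momentum` value of 0.9, and the L2 re-normalization step are assumptions, and the repository's actual update rule may differ.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else list(v)


def ema_update(prototype, feature, momentum=0.9):
    """Move a class prototype toward a test feature, then re-normalize it."""
    feature = l2_normalize(feature)
    mixed = [momentum * p + (1.0 - momentum) * f for p, f in zip(prototype, feature)]
    # Assumption: prototypes live on the unit sphere because they are
    # compared by cosine similarity.
    return l2_normalize(mixed)


prototype = [1.0, 0.0]
updated = ema_update(prototype, [0.0, 2.0])
print(updated)  # shifted slightly toward the test feature's direction
```

A momentum close to 1 keeps the prototype stable across the stream, while a smaller value lets it track recent test features more quickly. The backbone is not involved in this update, which matches the description of mode `P`.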
# Evaluate CLIP on CUB (with default M+P TTA)
uv run test.py --dataset_name cub --backbone clip --ckpt_path checkpoints/clip/cub/best_model.pth
# Evaluate DINO on Food-101 with Prototype TTA only
uv run test.py --dataset_name food --backbone dino --tta_state P --ckpt_path checkpoints/dino/food/best_model.pth
# Evaluate without TTA (baseline)
uv run test.py --dataset_name scars --backbone clip --tta_state none --ckpt_path checkpoints/clip/scars/best_model.pth

If you find this work useful for your research, please consider citing our paper:
@inproceedings{talon2026,
title={TALON: Test-time Adaptive Learning for On-the-Fly Category Discovery},
  author={Wu, Yanan and Yan, Yuhan and Chen, Tailai and Chi, Zhixiang and Wu, ZiZhang and Jin, Yi and Wang, Yang and Li, Zhenbo},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2026}
}